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Abstract 


We present formal models and results of formal analysis of two different asynchronous systems. We first 
examine a mid-value select module that merges the signals coming from three different sensors that are each 
asynchronously sampling the same input signal. We then consider the phase locking protocol proposed by 
Daly, Hopkins, and McKenna. This protocol is designed to keep a set of non-faulty (asynchronous) clocks 
phase locked even in the presence of Byzantine-faulty clocks on the network. All models and verifications 
have been developed using the SAL model checking tools and the Hybrid SAL abstractor. 
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1 Introduction 


Flight-critical control systems rely on redundant channels to tolerate failures in hardware and softw are com- 
ponents. Properly coordinating and managing redundancy in the presence of faults is critical to achieving 
high reliability. Two general classes of fault-tolerant architectures arc commonly used for this purpose: 

• Synchronous architectures rely on a common time base that enables the redundant channels to work 
synchronously. Once such a synchronous time base is established, different processors can work as 
exact deterministic replicas of each other. Fault-tolerant consensus algorithms ensure that all non- 
faulty replicas perform the same computation on the same input, and thus agree on the control output. 

• Asynchronous architectures do not attempt to synchronize the different channels. Each processing 
component is driven by its own local clock. Clocks can be out of phase and drift from each other. 
As a consequence, the control processors sample sensors and other input at slightly different points 
in time, which prevents them from reaching exact agreement. The control signals produced by dif- 
ferent channels cannot be guaranteed to be equal. Voting and equalization logic is then employed to 
compensate for the difference between these control signals. 

We examine two example systems that belong to each of these two classes of architectures. Both systems 
arc asynchronous and arc modeled formally using the SAL and Hybrid SAL tools [2,5,6]. The resulting 
SAL models are available at http : //www . csl . sri . com/users/bruno/vvf cs . html. 

• Our first example is a mid-value select module that merges the signals coming from three different 
sensors that arc each asynchronously sampling the same input signal. This module is paid of a larger 
control system with an asynchronous architecture. It combines signals from three asynchronous com- 
ponents into a single output. 

• Our second example is the clock phase locking protocol proposed by Daly, Hopkins, and McKenna [1]. 
This protocol keeps a set of hardware clocks in phase lock, even if some of the clocks in the system 
fail. This protocol is one of the first published protocol designed to tolerate components that fail in an 
arbitrary, possibly asymmetric manner, that is, what we now call Byzantine failures. 

The goal of the work presented here is to evaluate the possible mechanisms for modeling and analyzing 
different kinds of asynchronous systems. Lor the two case studies listed above, we present two different 
approaches for modeling asynchronous systems: 

• In the first approach, asynchronous sampling is modeled as nondeterministic selection from a set of 
possible sample values; that is, different asynchronously executing components will each nondeter- 
ministically select (possibly different) values from a set of values. 

• In the second approach, asynchronous effects are captured as synchronous events by performing a 
time abstraction; that is, events that occur at different, but very close, time instants arc fired at the 
same abstract time, and the abstract time is advanced when no asynchronous event happens. 

We provide more details below in the two sections dedicated to each of the two case studies. 


2 Robust Asynchronous Control 

The asynchronous case study consists of a control system that contains redundant controllers and redundant 
actuator interface units for tolerating faults. The system is asynchronous. All components, including con- 
trollers and actuator interface units, have distinct clocks, which may be out of phase and drift from each 
other. The system does not include any clock synchronization mechanism. 
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mvs(xl, x2, x3, bl, b2, b3, x) : 

// xl,x2,x3 are the three sensor values, 

// bl,b2,b3 are the corresponding valid bits, 
// x is the previous output value 


if 

bl is 

False 

then 

xl 

= X 

endif 

if 

b2 is 

False 

then 

x2 

= X 

endif 

if 

b3 is 

False 

then 

x3 

= X 

endif 

if 

(not 

bl and 

not b2) 

then 


newx 

= x3 





else if 

(not bl 

and 

not 

b3 ) 

then 


newx 

= x2 





else if 

(not b2 

and 

not 

b3 ) 

then 


nexx 

= vl 





else 







newx = (xl + x2 + x3) - max (xl , x2 , x3 ) - min (xl , x2 , x3 ) 
return newx 


Figure 1 . Pseudocode for the Mid- Value Select Component 


The key challenge in analyzing such systems arises from the asynchronous sampling and selection em- 
ployed by the redundant controllers and actuators. One of the main components of the asynchronous control 
case study is the mid-value select module that is used to collect and merge values coming from identical, 
but asynchronously executing, components. In this section, we present the mid-value select algorithm, its 
formalization in SAL, and then the results of formally analyzing it against a collection of requirements. We 
note here that we only formalize a small subsystem of the whole asynchronous case study. 

2.1 Mid- Value Select 

The pseudocode for the mid- value select (MVS) algorithm is given in Figure 1 . The algorithm takes three 
input values, x\, X2, X3 with corresponding validity bits bi, 62, Ls , and produces an output newx. If bit b t 
is true then input Xi is assumed valid. Otherwise, x % is assumed faulty and it is not taken into account in 
the computation. The inputs x\,X2,X3 to the mid- value select algorithm arc obtained by asynchronously 
sampling values of some continuous signal. If all three validity bits arc true, the mid-value select algorithm 
returns the middle value of the three inputs. If one 6* is false, then the corresponding x % is ignored; it is 
replaced by the mid- value x computed in the previous iteration. If two inputs are invalid, then the remaining 
good value is forwarded as the output. 

We now describe the process of formal modeling the above system. The continuous signal can be the 
output of some plant model described using ordinary differential equations. The mid- value select component 
is modeled using a regular (discrete) SAL module. The interesting question that we answer below is how 
to link the continuous plant and the discrete mid-value select module that takes as input three different 
asynchronous samples of some output signal produced by the plant. 

Figure 2 presents a Hybrid SAL model of the complete system. The final system model is a synchronous 
composition of three modules called plant, mvs, and asynchronousBus. We assume that the plant 
module outputs a continuous sinusoidal signal y defined by the system of ordinary differential equations 

dx 

di = V 

dy 
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MVS : CONTEXT = 

BEGIN 

faulty: BOOLEAN = FALSE ; 

reallyf aulty : BOOLEAN = FALSE ; 

dt: REAL =0.2 ; 

e: REAL = 0.05 ; 

midval(yl: REAL, y2 : REAL, y3 : REAL) : REAL = as in Figure 3 

plant : MODULE = 

BEGIN 

OUTPUT x, y, ylb, yub : REAL 

INITIALIZATION x = 1; y = 0; 

TRANSITION [ TRUE — > xdot' = y; ydot' = -x ] 

END; 


mvs : MODULE = as in Figure 3 

asynchronousBus : MODULE = 

BEGIN 

INPUT ylb, yub: REAL 

OUTPUT xl, x2 , x3 : REAL 

OUTPUT bl, b2 , b3 : BOOLEAN 

INITIALIZATION 

xl = yub; x2 = yub; x3 = yub; 
bl = TRUE; 

b2 = NOT ( reallyf aulty ) ; 
b3 = NOT (faulty) 

TRANSITION 


TRUE 

— > 

■ % faulty 

or not 

choose 

xl , x2 , x3 

correctly 

xl' 

IN 

{ z : REAL | 

(ylb' 

< = 

z 

AND 

z 

<= yub' 

) OR 




(ylb' 

> = 

z 

AND 

z 

>= yub' 

)}; 

x2 ' 

IN 

{ z : REAL | 

(ylb' 

< = 

z 

AND 

z 

<= yub' 

) OR 




(ylb' 

> = 

z 

AND 

z 

>= yub' 

)}; 

x3' 

IN 

{ z : REAL | 

(ylb' 

< = 

z 

AND 

z 

<= yub' 

) OR 




(ylb' 

> = 

z 

AND 

z 

>= yub' 

)}; 


] 

END; 

system: MODULE = asynchronousBus | | plant | | mvs ; 

END 


Figure 2. Hybrid SAL Model of the Mid- Value Select Component 
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midval (yl : REAL, y2 : REAL, y3 : REAL): REAL = 

IF yl <= y2 THEN 

(IF y2 <= y3 THEN y2 ELSIF yl <= y3 THEN y3 ELSE yl ENDIF) 
ELSE 

(IF yl <= y3 THEN yl ELSIF y2 <= y3 THEN y3 ELSE y2 ENDIF) 
ENDIF; 


mvs : MODULE = 


BEGIN 

INPUT xl, x2 , x3 : REAL 

INPUT bl, b2 , b3 : BOOLEAN 

OUTPUT x: REAL 

INITIALIZATION x = 0 
TRANSITION 
[ TRUE — > 

x' = midval ( 

IF bl' THEN xl' ELSIF NOT(b2') 
ELSIF NOT (b3' ) THEN x2' ELSE x 
IF b2 ' THEN x2' ELSIF NOT(bl') 
ELSIF NOT (b3' ) THEN xl' ELSE x 
IF b3 ' THEN x3' ELSIF NOT(bl') 
ELSIF NOT (b2 ' ) THEN xl' ELSE x 


THEN x3 ' 
ENDIF, 
THEN x3 ' 
ENDIF, 
THEN x2 ' 
ENDIF) 


END; 


Figure 3. The Mid- Value Select Algorithm in SAL 


This is enough for illustrating the analysis approach, but a more complex model could be substituted for the 
plant module of Figure 2. 

The Hybrid SAL model uses Boolean constants faulty and reallyfaulty that can be set to true 
or false to explore different fault scenarios. If both constants are false, then there arc no faults in the system. 
If faulty is true, then one sensor value (namely, X3) is faulty. And if both faulty and reallyfaulty 
arc true, then the two sensor values xy and xy are assumed faulty. 

The mvs module takes three input values x\ to xy, and three valid bits b\ to 63, and it outputs one value 
computed according to the algorithm described in Figure 1. The corresponding SAL module is shown in Fig- 
ure 3 and it uses the midval function that is shown in the same figure. Finally, the asynchronousBus 
module takes in the values produced by the plant and uses them to compute the inputs for the mvs. To 
model the asynchronous sampling, we model the output from the plant not as a single real- valued signal, but 
as an interval [ylb, yub\. This interval is defined as the set of values that the output y can take in the time 
interval [dt — e, dt], where dt and e arc fixed parameters defined in the SAL context. 

Thus, the idea for modeling asynchronous systems consists of modeling the plant so that it generates a 
set of states (such as an interval) as its output. The asynchronousBus then takes the set and nondeter- 
ministically selects multiple values from this set. This is our model for asynchronous sampling. 

2.2 Discrete Abstraction 

In the plant module of Figure 2, we have not shown how the plant computes ylb and yub. This is be- 
cause the expression for ylb and yub will be derived by the Hybrid SAL relational abstractor automatically 
from the given (ordinary differential equation) dynamics of the variables x and y, and the time constants dt 
and e. Our current version of the Hybrid SAL relational abstractor does not yet support this feature. We 
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plan to add this feature to the Hybrid SAL relational abstractor in the next few months. 

On this specific example, we intend the abstractor to produce an abstract discrete model similar to the one 
shown in Figure 4. This discrete model was constructed by hand, using the relational-abstraction technique 
described in [6], adjusted to compute constraints on the variables ylb and yub rather than on a single value 
y. The constraints on ylb and yub are derived by letting variables x and y evolve for a time interval of 
length dt (from their current values) according to the differential equations, and recording the value of y at 
both the end of this interval (i.e., at time dt) and at the intermediate point dt — e. The abstraction introduces 
an auxiliary Boolean variable inc that indicates whether y is increasing or decreasing at the beginning of 
the interval [0, dt}. 

In addition to the abstracted plant. Figure 4 contains an observer module called deadzoneMonitor. 
This module detects deadzones, that is, situations during which the input changes, but the output does 
not. If this behavior is observed, then the deadzoneMonitor module sets the flag variable to true. 
Composing this observer with the system module lets us check for deadzone behavior in the model. 

Every state transition of the full system monitor of Figure 4 consists of the following three steps: 

• First, the plant module updates its outputs. The plant module computes the lower and upper- 
bound for the variable y reached in the time interval \dt — e, dt]. 

• The asynchronousBus module then reads in the new lower- and upper-bounds for y and non- 
deterministically chooses three values from that range. This models asynchronous sampling. The 
three values are output by the asynchronousBus module. Note the asynchronousBus module 
reacts to the primed forms of the inputs, indicating that it uses the updated values for ylb and yub 
(generated in the first step). 

• Finally, the mvs module applies the mid- value select computation to the three values it gets the 
asynchronousBus. The result is the output variable x. 

This shows how we model asynchronous sampling using SAL synchronous composition of base modules. 

2.3 Properties and Analysis 

The SAL model of the asynchronous system is an infinite-state model and it is not possible to verify it 
using exhaustive state space exploration. It can only be analyzed using the SAL bounded model checker for 
infinite systems (sal-inf-bmc). All proofs use sal-inf-bmc as a k-induction prover (i.e.. by giving 
option -i) to sal-inf-bmc). 

In Ligure 5, we have briefly enumerated some properties that can be formally checked against the models 
system and system monitor. Property pO states that the output of the mid-value select is not too far 
off from the actual value of the signal. This property can be verified for all fault scenarios. It is proved by 
invoking sal-inf-bmc as follows 

sal-inf-bmc -i -d 1 MVS. sal pO 

This calls sal-inf-bmc to check property pO of a SAL context contained in file MVS . sal using k- 
induction. Lor this property, induction depth of 1 is sufficient (i.e., k=l). This is specified by giving option 
-d 1 on the command line. 

Properties 11, 12, 13 arc listed just because they are sometimes useful as auxiliary lemmas for 
proving other properties by k-induction. Lemma 1 3 is provable in all fault scenarios in the same way as 
property pO (i.e., by induction at depth 1). Lemma 11 is provable by the same method if there are no faults. 
Lemma 12 is provable if there is a single fault. Property p2 is false and is included just to debug the model. 
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MVS : CONTEXT = 

BEGIN 

global parameter declarations as in Figure 2 

midval(yl: REAL, y2 : REAL, y3 : REAL) : REAL = as in Figure 2 

plant : MODULE = 

BEGIN 

OUTPUT yub, ylb: REAL 

LOCAL inc : BOOLEAN 

INITIALIZATION 

yub = 0; ylb = 0; inc = TRUE; 

TRANSITION 

[ yub>= -0 . 8 AND yub <= 0.8 AND inc — > 

yub' = yub + dt * 1 ; ylb' = yub + (dt - e) * 1; 

[] 

yub >= -0.8 AND yub <=0.8 AND NOT (inc) — > 

yub' = yub + dt * (-1) ; ylb' = yub + (dt - e) * (-1) ; 

[] 

((yub <= -0.8 AND yub >= -1) OR yub >= 0.8) AND NOT (inc) — > 
yub' = yub + dt * (-0.1) ; ylb' = yub + (dt - e) * (-0.1); 

inc' = IF yub' <= -1 THEN TRUE ELSE FALSE ENDIF ; 

[] 

((yub >= 0.8 AND yub <= 1) OR yub <= -0.8) AND inc — > 

yub' = yub + dt * (0.1) ; ylb' = yub + (dt-e) * (0.1) ; 

inc' = IF yub' >= 1 THEN FALSE ELSE TRUE ENDIF ; 

] 

END; 

mvs : MODULE = as in Figure 2 

asynchronousBus : MODULE = as in Figure 2 

system: MODULE = asynchronousBus | | plant | I mvs ; 

deadzoneMonitor : MODULE = 

BEGIN 

INPUT x, yub: REAL 

OUTPUT flag: BOOLEAN 

INITIALIZATION flag = FALSE 
TRANSITION 

[ NOT (yub = yub' ) --> flag' = (x = x' ) 

[] ELSE — > 

] 

END; 

systemunonitor : MODULE = system | | deadzoneMonitor ; 


Figure 4. Abstracted SAL Model of the Mid- Value Select Component with Observer 



pO : THEOREM 

systemjnonitor | - G((x >= yub AND x - yub <= 0.05) OR 

(x <= yub AND yub - x <= 0.05)); 


11 : 

THEOREM 

systemjnonitor 

1- G 

12 : 

THEOREM 

systemjnonitor 

1- G 

13 : 

THEOREM 

systemjnonitor 

1- G 

pi: 

THEOREM 

systemjnonitor 

1- G 

p2 : 

THEOREM 

systemjnonitor 

1- G 


( bl AND b2 AND b3 ) ; 

( bl AND b2 AND NOT(b3) ); 
(-1.02 <= yub AND yub <= 1.02); 
( flag = FALSE ) ; 

( inc ) ; 


Figure 5. Correctness Properties for the Mid- Value Select Component 


Property pi says that flag is never true, which means that there is no instance of deadzone behavior. 
Changing the value of parameters dt and e influences the validity of this property. When e is small, 
then there is no deadzone behavior since a small value of e indicates more “synchronization” between the 
three elements. When e is large, then deadzones can be observed even in the absence of faults. Perhaps 
surprisingly, the behavior of this model is not affected much by the presence or absence of faults. For 
example, if e is set to 0.05 and dt is 0 . 2 then no deadzone is observed even if one or two sensors arc 
faulty. In all fault scenarios, property pi is provable by induction at depth k = 2. On the other hand, for the 
same value of d, if e is increased to 0 . 1, then deadzone can be observed even in the fault-free case. 
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3 Draper Phase Locking Protocol 


We now discuss our SAL model of the fault-tolerant clock system proposed by William M. Daly, Albert L. 
Hopkins, Jr., and John F. McKenna from MIT Draper Labs [1], This system consists of an array of identical 
clock elements that communicate with each other to produce a number of phase-locked signals. If less than 
a third of these clock elements arc faulty, then all non-faulty outputs are in phase lock. This property holds 
even if the faulty components arc asymmetric, that is, even in the presence of Byzantine faults. 

We have developed several SAL models of this system using the same modeling approach. A previous 
model (available at http : / /www . csl . sri . com/ users/bruno/vvf cs . html) encoded the rules 
of the protocol at an abstract level. Using this model, we concluded that the protocol can maintain clocks in 
phase locks even in the presence of Byzantine faults. In other words, if the clocks arc initially in phase lock, 
they will remain phase locked, even in the presence of Byzantine fault. However, in this model, the algorithm 
did not ensure that phase lock is reached from arbitrary initial configurations. There were scenarios in which 
a Byzantine clock caused the good clocks to form two disjoint cliques running in opposite phase. 

We have refined this initial SAL model by modeling more precisely the hardware circuit of the clock 
elements presented in the paper by Daly et al. [1]. Using this refined model, we can prove that good clocks 
do eventually synchronize, from any initial configuration, under the assumption that the faulty clock does 
not change more than once at a given time instant, i.e., it is not allowed to change too fast. In other words, 
we need to add a test in each clock element that checks to make sure that other clock elements do not send 
it different values too frequently. 

We now describe the modeling and analysis results in more detail. 

3.1 Draper Clock Phase Locking 

The Draper clocking system consists of n identical clock elements that communicate with each other to 
produce a number of phase-locked clock signals. If n > 3/ + 1, then the clocking system can tolerate / 
Byzantine faults and maintain the 2/ + 1 good clocks in phase-locked state. 

Since the clock elements execute asynchronously, the phase-locked condition is defined as follows: two 
clocks are phase-locked to within t s time units if state changes in the two clocks occur within t s time units 
of each other. The parameter t s is a function of the communication delay between the clock elements and is 
assumed to be significantly smaller than the time period of the clocks. 

Each individual clock element receives the state of the other clocks. It uses this information to compute 
various quorum functions on the clock states. Based on the values of the computed quorum functions, the 
clock element possibly triggers one-shot monostable multivibrators, which then set the value of the output 
clock via two flip flops. 

In this asynchronous system, state changes happen at two different time scales: 

• A fast time scale where multiple state changes happen in a relatively short time span (microsteps). 

• A slow time scale where no state change occurs until some timer expires (macrostep), which then 
initiates a cascade of microsteps. 

3.2 Formalization in SAL 

We formalize the Draper clocking system by first abstracting time. Each clock element in the original system 
has two one-shot multivibrators, called zlO and zll, with time period of 860 ns and 200 ns respectively. It 
also has two flip-flops, zl2a and zl2b, where the value stored in flip-flop zl2a is the current state of the 
clock. Thus, the state of a clock element consists of 

(a) the Boolean values stored in the two flip-flops and 
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N: NATURAL =3; % number of NON-FAULTY nodes 

f: NATURAL = 1; % number of faulty nodes 

Node : TYPE = [1 . . N] ; 

tMax: NATURAL = 11; % time period of zlO 
tMin: NATURAL =3; % time period of zll 

Time: TYPE = [-1 .. tMax]; 

min (x, y : ARRAY [ 1 . . N] OF Time, i : [ 0 . . N] , a : Time) : Time = 

IF i = N AND a = tMax THEN 0 
ELSIF i = N THEN a 

ELSIF (x [ i+1 ] < y [ i+1 ] OR y[i+l] = -1) AND x[i+l] < a AND x[i+l] > 0 
THEN min (x, y, i+1, x[i+l]) 

ELSIF (x [ i+1 ] >= y [i+1 ] OR x[i+l] = -1) AND y[i+l] < a AND y[i+l] > 0 
THEN min (x, y, i+1, y[i+l]) 

ELSE min (x, y, i+1, a) ENDIF ; 

sum (x : ARRAY [1..N] OF BOOLEAN, i: [0..N], a: [0..N]): [0..N] = 

IF i = N THEN a ELSIF x[i+l] THEN sum(x, i+1, a+1) 

ELSE sum (x, i+1, a) ENDIF; 


Figure 6. Constants and functions used in the Draper Clock Phase Locking SAL Model 

(b) the time remaining before each one-shot will return to its stable state (i.e. before it will output a falling 
edge). 

If zlO or zll are in their stable state (i.e., they have not been triggered recently), then we set the time 
remaining to -1. Since the time periods are 860 ns and 200 ns, hence, at any instant, the time remaining 
before zlO and zl 1 fire will either be a value between 0 and 860 ns, or it will be —1. 

In the formal model of a clock element, we replace the continuous time range by the integer interval 
[— 1 . . . tMax]. In our experiments, tMax was fixed to 11 time units and the time periods of the two one- 
shots was fixed to 11 and 3 respectively. As seen in Figure 7, the state of each clock element in our model 
consists of: 

• the state of the clock, cO, which is a Boolean-valued variable (stored in flip-flop zl2a) 

• the Boolean value stored in the flip-flop z 12b 

• the time remaining in the two one-shots, zlO and zll, which are values in the integer interval 
[— 1 . . . tMax ] 

The SAL model is parameterized by N, the number of good clocks, and /, the number of faulty clocks. 
Each clock element receives as input the values of two quorum functions qni and qf i, which arc computed 
based on the values of all the clock elements. We denote by Q(b, a) the quorum function that has value 1 if a 
or more of its b inputs arc 1, and has value 0 otherwise. Then, qf i is the quorum function Q(N + /, / + 1) 
applied to the N + / clocks in the system, and qni is the quorum function Q(N + /, N) of the same N + / 
clocks. In other words, qf i is 1 if at least / + 1 of the clocks arc 1 and qni is 1 if at least N clocks arc 1. 

Each clock element also receives another input, namely timeAdvance, which is a value in the range 
[0 . . . tMax\, that indicates how much time has elapsed since the last state transition. 

The Draper clocking system model is obtained by composing the clock element models. There is one 
other module, called calculator, that computes the quorum functions and the value of the timeAdvance 


11 



element [i: Node] : MODULE = 

BEGIN 

OUTPUT zlOi, zlli : Time 

LOCAL zl2b : BOOLEAN 

OUTPUT cO : BOOLEAN 

INPUT qni, qf i : BOOLEAN 

INPUT timeAdvance: Time 

INITIALIZATION 
zlOi = tMax - i; 
zlli = -1; 
zl2b = TRUE 
TRANSITION 
[ 

(NOT (qni) AND qni' AND NOT(qfi) AND qfi') OR 
(qf i AND NOT (qfi') AND qni AND NOT (qni')) — > 
zlOi' = tMax-1 ; 

zlli' = IF zlli = -1 THEN tMin ELSE zlli END IF ; 
cO' = IF zlli = -1 THEN zl2b ELSE cO ENDIF 

[] 

((NOT (qni) AND qni') OR (qfi AND NOT (qfi' ) ) ) AND 
NOT ( (qni AND NOT (qni' ) ) OR (NOT (qfi) AND qfi')) — > 
zlOi' = tMax-1 

[] 

NOT ( (NOT (qni) AND qni') OR (qfi AND NOT(qfi'))) AND 
( (qni AND NOT (qni' ) ) OR (NOT (qfi) AND qfi')) — > 
zlli' = IF zlli = -1 THEN tMin ELSE zlli ENDIF; 
zlli' = IF zlli = -1 THEN tMin ELSE zlli ENDIF; 
cO' = IF zlli = -1 THEN zl2b ELSE cO ENDIF 

[] 

See remaining transitions in Figure 8 


Figure 7. Model of an Individual Clock Element in SAL (Paid 1) 
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element [i: Node] : MODULE = 

BEGIN 

TRANSITION 

continuing from Figure 7 

[] 

qni = qni' AND qfi = qfi' AND timeAdvance' > 0 AND 
zlOi = timeAdvance' AND NOT(zlli = timeAdvance') — > 
zlOi' = tMax-1; 

zlli' = IF zlli = -1 THEN tMin ELSE zlli - timeAdvance' ENDIF 
cO' = IF zlli = -1 THEN zl2b ELSE cO ENDIF 

[] 

qni = qni' AND qfi = qfi' AND timeAdvance' > 0 AND 
zlOi = timeAdvance' AND zlli = timeAdvance' — > 
zlOi' = tMax-1 ; 
zlli' = -1 ; 
zl2b' = NOT (qni) ; 

[] 

qni = qni' AND qfi = qfi' AND timeAdvance' > 0 AND 
zlli = timeAdvance' AND NOT(zlOi = timeAdvance') — > 

zlOi' = IF zlOi = -1 THEN -1 ELSE zlOi - timeAdvance' ENDIF; 

zlli' = -1 ; 

zl2b' = NOT (qni) ; 

[] 

qni = qni' AND qfi = qfi' AND timeAdvance' > 0 AND 
NOT (zlli = timeAdvance') AND NOT(zlOi = timeAdvance') — > 
zlOi' = IF zlOi > 0 THEN zlOi - timeAdvance' ELSE -1 ENDIF; 
zlli' = IF zlli > 0 THEN zlli - timeAdvance' ELSE -1 ENDIF; 

[] 

qni = qni' AND qfi = qfi' AND timeAdvance' = 0 — > 

] 

END; 


Figure 8. Model of an Individual Clock Element in SAL (Paid 2) 



draperClockv8 : CONTEXT = 

BEGIN 

constant and function declarations from Figure 6 
calculator: MODULE = 


BEGIN 

INPUT c: ARRAY [1 .. N] OF BOOLEAN 

INPUT zlO, z 1 1 : ARRAY [1 .. N] OF Time 

OUTPUT qnmf , qfpl : ARRAY [1 .. N] of BOOLEAN 

OUTPUT timeAdvance: Time 


LOCAL smin: 

[1 ■ 

. N] 


LOCAL faultc: ARRAY Node OF [ 0 . . 1] 

INITIALIZATION 



faultc = 

[ [i:Node] 

0] ; 

qnmf = [ 

[i: [1. 

• N] ] 

FALSE ] ; 

qfpl = [ 

[i: [1. 

• N] ] 

FALSE ] ; 

timeAdvance = 0 



DEFINITION 

smin = 

sum(c, 0, 0) ; 

TRANSITION 

r 




L 

timeAdvance 

<= 0 • 

— > 


qnmf' = [ 

[1: [1 

. -N] ] 

smin+faultc [ i ] >= 

qfpl' = [ 

[1: [1 

. -N] ] 

smin+faultc [ i ] >= 


timeAdvance' = IF FORALL (i:[l..N]): (qnmf[i] = qnmf'fi] 

qfpl [i] = qfpl' [i] 
THEN min (zlO, zll, 0, tMax) ELSE 0 END IF 

[] 


timeAdvance > 0 — > 

faultc' IN {a: ARRAY Node OF [0..1] 

qnmf' = [ [i: [1 . .N] ] smin+ faultc' [i] 

qfpl' = [ [i: [1. .N] ] smin+faultc' [i] 

timeAdvance' = IF FORALL (i: [1. .N] ) : 

THEN min (zlO, zll, 0, 


I TRUE } ; 

>= N ] ; 

>= f+! ] ; 

(qnmf [i] = qnmf' [i] 
qfpl [i] = qfpl' [1] 
tMax) ELSE 0 END IF 


END; 

element [i: Node] : MODULE = from Figure 7 

continued in Figure 10 


Figure 9. Draper Clock Phase Locking in SAL (Part 1) 


AND 

) 


AND 

) 
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draperClockv8 : CONTEXT = 

BEGIN 


first part of model in Figure 9 
system: MODULE = calculator | 

( WITH OUTPUT c: ARRAY Node OF BOOLEAN 


WITH INPUT qnmf 
WITH INPUT qfpl 
WITH OUTPUT zlO 
WITH OUTPUT z 1 1 
(II (i : Node) : 


ARRAY Node OF BOOLEAN 
ARRAY Node OF BOOLEAN 
ARRAY Node OF Time 
ARRAY Node OF Time 
RENAME cO TO c[i], qni 


qfi TO qfpl[i], zlOi 
zlli TO zll [i] 

IN element [i] ) ) ; 


TO qnmf [ i ] , 
TO zlO [i] , 


monitor: MODULE = 

BEGIN 

INPUT c: ARRAY Node OF BOOLEAN 

INPUT timeAdvance: Time 

OUTPUT flag: BOOLEAN 

INITIALIZATION flag = FALSE 
TRANSITION 
[ 

timeAdvance' > 0 AND FORALL (i:Node) : ( c [ 1 ] = c[l]) --> 

flag' = TRUE 

[] 

timeAdvance' > 0 AND EXISTS (i:Node): NOT(c[i] = c[l]) — > 

flag' = FALSE 

[] 

ELSE — > 


END; 


system_monitor : MODULE = system | | monitor ; 

See properties in Figure 11 
END 


Figure 10. Draper Clock Phase Locking in SAL (Paid 2) 
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variable. Thus, the calculator produces the inputs needed by each clock element. The calculator 
module receives the values of all (non-faulty) clock elements as input and the time remaining in the one- 
shots of all the clock elements. A key feature of this model is that we do not have a module instance for the 
faulty clock. We assume exactly one faulty clock in the SAL model, but the model can be easily extended 
so that it has / faulty clocks. 

The calculator models the Byzantine nature of the faulty clock — each of the good clock elements 
can see a different value for the faulty clock. The calculator module works as follows: 

• It first computes the values of the quorum function for each of the non-faulty clock elements. 

• If there is leading or falling edge on any of the quorum values, calculator sets timeAdvance to 
zero. Otherwise, it sets timeAdvance to the minimum time remaining on any of the one-shots. The 
min function in Figure 6 is used for this purpose. Thus, the calculator ensures that all microsteps 
arc “completed” before the next “macrostep” is initiated. 

• The Byzantine-faulty clock is allowed to non-deterministically pick the N clock values it wishes to 
show to the N non-faulty clocks, but only when timeAdvance is positive. This essentially disallows 
the Byzantine faulty clock from changing its value indiscriminately - and an arbitrary number of times 
- in the same abstract time instant. 

This last point is crucial for proving correctness of the Draper clocking system. 

3.3 Modeling Decisions 

In the actual Draper clocking system, when one clock changes state, all the other clocks will asynchronously 
observe this event. They may not all see the change at the same time instant, but they will definitely see 
it within a bounded amount of time. Modeling this behavior exactly in SAL is possible, but leads to very 
complex formal models — one clock changes state and some clock does not “observe” the change in the 
same SAL transition. In order to guarantee that every clock does observe all such clock change events, one 
would have to maintain a stack of pending events. 

In our formalization, we model events that occur within some bounded time of each other (microsteps) 
occurring in the same state transition. Thus, in our model, when a clock changes state, all the clocks observe 
this change and respond to it. This way we do not have to store a stack of pending events. As a result, we 
arc potentially not capturing all possible behaviors of the concrete system. However, we believe that the we 
arc not missing too many behaviors of the concrete system, since the order in which events arc processed by 
different clock elements (order of the microsteps) does not seem to matter for determining the final state of 
the system. 

Finally, the paper by Daly et al. [1] does not provide complete details of the clocking circuit and some 
aspects arc ambiguous. For example, it does not fully specify the behavior of the one-shot multivibrators and 
the flip-flops. In such cases of ambiguity, we made certain assumptions while building the formal model. 

3.4 Verification results 

Figure 1 1 lists some properties of the clocking system. Of particular' interest is property p 9 which states 
that eventually the clocks are synchronized. For the above SAL model, the SAL symbolic model checker, 
sal-smc, was able to analyze all the properties. All the properties listed are true, except p3 and p4 (which 
were used to debug the model). 

An important observation coming from our verification effort was that it is important to limit the Byzan- 
tine faulty clock so that it can not send different clock values to the same clock very frequently. Without this 
assumption, it is not possible to prove that the system will always eventually reach a state of phase locking. 
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pi : LEMMA 

system | — F ( G ( (FORALL ( i : [ 1 . . N] ) : c[i] = c[l])) ); 

p2 : LEMMA 

system |- F ( ((FORALL (i: [1. -N] ) : c[i] = c [ 1 ] ) ) ); 

p3 : LEMMA 

system - G ( timeAdvance = 0 ) ; 
p4 : LEMMA 


system | - 

G ( 

EXISTS 

(i: 

[1. 

• N] ) : 

zll [i] 

< tMin ) ; 

p5 : LEMMA 








system | - 

G ( 

EXISTS 

(i: 

[1. 

• N] ) : 

zlO [i] 

> tMin ) ; 

p6 : LEMMA 








system | - 

G ( 

EXISTS 

(i: 

[1. 

• N] ) : 





NOT (2 

:10 [ 

i] 

= -1) 

OR NOT 

(zll [1] = 

p7 : LEMMA 








system | - 

F 

( ( (FORALL 

(i: 

[1. -N] 

) : c [ i ; 

= c [1] ) ) 


p8 : LEMMA 

system_monitor |- F(G ( flag )); 
p9 : LEMMA 

system | — F ( G ( ((FORALL (i:[l..N]): c[i] = c[l])) ) ); 

plO : LEMMA 

system |- G (F (timeAdvance > 0)); 
pi 1 : LEMMA 

system |- G(F(EXISTS ( i : [ 1 . . N] ) : 

(NOT ( z 1 0 [ i ] = -1) OR NOT ( zl 1 [ i ] = -1)))); 

pi 2: LEMMA 

system |- G ((FORALL ( i : [1 . . N ] ) : c[i] = c[l]) => 

G ( (FORALL ( i : [ 1 . . N] ) : c[i] = c[l])) ); 


Figure 11. Correctness Properties for the SAL model of the Draper Clock Phase Locking System 
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Another option would be to not limit the Byzantine faulty clock, but include an extra check in each clock 
element that ignores inputs from a clock if they arrive too frequently. Thus, each clock element is required 
to perform some form of faulty clock detection. We believe it is not possible to achieve clock synchro- 
nization using purely quorum sensing and without any form of fault detection/isolation. Such monitoring 
mechanisms arc employed in the self-stabilizing Byzantine-fault-tolerant clock synchronization protocols 
proposed by Malekpour [3,4], Note that the clocking system described by Daly et al. [1] does not include 
any provision for such fault isolation. 


4 Conclusion 

We have presented two possible methods for modeling and verifying asynchronous fault-tolerant systems 
using the SAL tool chains. 

Our first case study is an asynchronous mid-value select module, which is paid of a larger fault-tolerant, 
asynchronous system. In such systems, different functional modules arc driven by local clocks that arc not 
synchronized. They sample input at slightly different points in time, thus possibly producing different out- 
put, which have to be combined and reconciled in some way to drive actuators. The mid- value select module 
we have described is a common approach for combining three asynchronously produced input into a single 
output, while providing support for fault tolerance. For analyzing such a module in a larger context (that 
may involve continuous dynamics), we aim to represent the mid-value select mechanism in Hybrid SAL. We 
have proposed an abstraction approach that approximates asynchronous sampling by the non-deterministic 
selection of state values in an interval. This approach can be automated by modifying the timed-relational 
abstraction algorithm implemented by Hybrid SAL, so that it considers time intervals rather than time points. 

Our second case study is the Byzantine fault-tolerant clock system presented in [1], We modeled this 
system by using a discrete-time approximation. The resulting model is a standard SAL model (with no 
continuous dynamics), that has a finite state space. As a result, the system can be automatically verified 
using a classical BDD-based model checker. One key property discovered in this case study was the need to 
restrict the fault model so that the value of Byzantine clocks do not change too fast. Under this assumption, 
we have proved all key properties of the protocol: phase-locked synchronization is eventually reached from 
an arbitrary initial configuration of the clocks, and the non-faulty clocks remain in phase lock from then on. 
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